Unsupervised Quality Estimation for Neural Machine Translation
Similar resources
Unsupervised Neural Machine Translation
In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely...
Improving Machine Translation Quality Estimation with Neural Network Features
Machine translation quality estimation is a challenging task in the WMT evaluation campaign. Feature extraction plays an important role in automatic quality estimation, and in this paper, we propose neural network features, including embedding features and cross-entropy features of source sentences and machine translations, to improve machine translation quality estimation. The sentence embeddi...
Tree Kernels for Machine Translation Quality Estimation
This paper describes Uppsala University’s submissions to the Quality Estimation (QE) shared task at WMT 2012. We present a QE system based on Support Vector Machine regression, using a number of explicitly defined features extracted from the Machine Translation input, output and models in combination with tree kernels over constituency and dependency parse trees for the input and output sentenc...
Adaptive Quality Estimation for Machine Translation
The automatic estimation of machine translation (MT) output quality is a hard task in which the selection of the appropriate algorithm and the most predictive features over reasonably sized training sets plays a crucial role. When moving from controlled lab evaluations to real-life scenarios the task becomes even harder. For current MT quality estimation (QE) systems, additional complexity come...
Unsupervised Tokenization for Machine Translation
Training a statistical machine translation system starts with tokenizing a parallel corpus. Some languages, such as Chinese, do not incorporate spacing in their writing system, which creates a challenge for tokenization. Moreover, morphologically rich languages such as Korean present an even bigger challenge, since optimal token boundaries for machine translation in these languages are often unclear. Bo...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2020
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00330